Abstract: We conduct the most comprehensive study of 
WLAN traces to date. Measurements collected from four major 
university campuses are analyzed with the aim of developing 
fundamental understanding of realistic user behavior in wireless 
networks. Both individual user and inter-node (group) behaviors 
are investigated and two classes of metrics are devised to capture 
the underlying structure of such behaviors. 

For individual user behavior, we observe distinct patterns in 
which most users are 'on' for a small fraction of the time, the 
number of access points visited is very small, and the overall 
online user mobility is quite low. We clearly identify categories 
of heavy and light users. In general, users exhibit a high degree 
of similarity over days and weeks. 

For group behavior, we define metrics for encounter patterns 
and friendship. Surprisingly, we find that a user, on average, 
encounters less than 6% of the network user population within 
a month, and that encounter and friendship relations are highly 
asymmetric. We establish that the number of encounters follows 
a biPareto distribution, while friendship indexes follow an 
exponential distribution. We capture the encounter graph using 
a small world model, the characteristics of which reach steady 
state after only one day. 

We hope that our study will have a significant impact on the 
realistic modeling of network usage and mobility patterns in 
wireless networks. 

I. Introduction 

Wireless networks have recently been gaining popularity 
and are being deployed ubiquitously in various environments, 
especially on university campuses. With more users switching 
to wireless networks, understanding user behavior in such 
environments is becoming increasingly important. First, analysis 
of user behavior and network usage patterns enables accurate 
assessment of wireless network utilization and aids in developing 
better management techniques and capacity planning 
decisions. Second, usage analysis is a necessary first 
step towards developing realistic models of usage patterns and 
mobility models that are crucial for the design and evaluation 
of wireless networking protocols. Third, as new technologies 
evolve (e.g., variants of 802.11 WLANs, or ad hoc networks), 
a fundamental understanding of user behavior becomes essential 
for the successful deployment of the emerging technology. 

Although many wireless network protocols have been developed 
over the past decade, the majority have been designed 
independently of the context in which they may be deployed 
and are usually evaluated using artificial (e.g., synthetic, often 
unrealistic) models. We believe that the design and evaluation 
of next-generation wireless networks should go hand in hand 
with a deep, insightful understanding of the realistic 
environments in which they will be deployed and used. 

The main focus of this paper is to gain further understanding 
of realistic user¹ behavior (e.g., usage, mobility) utilizing the 
most extensive wireless LAN traces collected to date from four 
major university campuses. Our study is different from (but 
relates to) studies on mobility modeling [6], [7]. The WLAN 
traces provide coarse-grained location snapshots of users, and in 
this work we seek to describe user behavior at access point 
(AP) or building granularity. While mobility models describe 
how users move, a WLAN trace captures the combined result 
of the movement, network access, and usage patterns of users. 
In that sense, one may envision that all-encompassing models 
(but perhaps not pure mobility models) may be built using our 
study. Such models may be used to validate mobility models 
at a coarse-grained level. While it is difficult to collect large 
amounts of raw mobility traces, existing university WLANs 
provide a very good opportunity to collect WLAN usage traces 
and enable analysis of user behavior. 

A few studies have previously been conducted on WLAN 
traces [1], [2], [3], and we borrow from these traces and 
studies as appropriate. These studies are quite helpful, but 
each of them was conducted on a single campus, so it is 
unclear whether their findings generalize beyond the 
studied campus. Furthermore, most of these studies focus on 
metrics for individual users and building or AP usage. In our 
study, we go beyond previous work to compare 
user behavior across different campuses, which allows us to 
generalize our findings and reason about commonalities and 
differences between campuses. We also define new metrics 
and models to study group behavior, in addition to individual 
users, to capture important inter-user correlations. Our 
approach enables us to obtain new (sometimes surprising) 
results on the fundamental understanding of wireless network 
usage. Hence, this study is novel in several aspects, and 
we expect it to significantly impact future research on realistic 
and trace-based modeling and analysis of wireless networks. 



¹In this paper we use the terms "user", "node", and "mobile node (MN)" 
interchangeably to describe a wireless network user. 



In this paper we propose two categories of metrics to 
analyze the behavior of individual users and of groups of users. 
From the individual user analysis, we observe that exact 
user behaviors differ not only due to the underlying 
environment and campus, but also because of the various methods 
used for trace collection and analysis. In general, one can 
identify categories of heavy users and light users, and each 
user tends to use only a very small subset of the APs 
on campus. In most cases we identify repetitive patterns in 
user behavior over various time frames (e.g., days, weeks). 
We further study the inter-node relationships between users 
to understand group behavior. By looking at encounter 
relationships among users and defining friendship indexes, we 
provide several new methodologies to understand the underlying 
user behaviors in wireless networks. The distributions of both 
encounters and friendship indexes are highly asymmetric, 
indicating a heterogeneous user population. Surprisingly, we 
find that a user, on average, encounters between 1.8% and 6% 
of the network user population within a month. We establish 
that the number of encounters follows a biPareto distribution, 
while friendship indexes follow an exponential distribution. 
We utilize a Small World model to understand the relationships 
in the encounter graphs formed by wireless network users. To 
our surprise, the metrics of the resulting Small Worlds saturate 
after only one day (in a one-month trace period). Finally, we 
propose information diffusion experiments to reveal the richness 
of encounter patterns. Many of our findings point to invalid 
assumptions often made in mobility modeling and simulation, 
and provide guidelines for realistic modeling of user behavior 
on university campuses. 

The major contributions of this paper are: 

• Using WLAN traces from four different campuses, we 
compare the results and highlight both similarities and 
differences. As far as we know, this is the largest-scale 
trace-based study in the literature. 

• By proposing metrics for describing individual MN 
behaviors, we construct a basis on which models for 
individual MNs can be established. We also find several facts 
indicating that traditional, randomly generated synthetic 
mobility models (such as random waypoint, random walk, 
etc.) are not adequate for a heterogeneous environment 
such as a university campus. 

• By proposing new methodologies to understand relationships 
between MNs, we build tools and understanding 
that should be useful for future research. 

The rest of the paper is organized as follows. In Section II 
we discuss related work. The studied wireless network 
environments and trace-collection issues are discussed in 
Section III. We introduce various metrics to describe individual 
user behavior in depth in Section IV. After that, we study the 
relationships between MNs by designing various experiments in 
Section V. Finally, we provide some discussion and directions 
for potential future work in Section VI and conclude the paper 
in Section VII. 



II. Related work 

Driven by the growing popularity of wireless LANs in 
recent years, there is increasing interest in studying the usage 
of wireless LANs. Several previous works [1], [2], [3] have 
provided extensive studies of wireless network usage statistics 
and made their traces available to the research community. 
Our work builds upon these findings and traces. 

With these traces available, more recent research 
has focused on modeling user behavior in wireless LANs. In [4] 
the authors propose models to describe traffic flows generated 
by wireless LAN users, which differs from the focus of this 
paper. In the first part of this paper we focus on 
identifying metrics that capture important characteristics of 
user association behavior. We treat user associations 
as coarse-grained mobility at per-AP granularity. 
A similar methodology has been used in [1] and [5]. In [5] the 
authors propose a mobility model based on the association session 
length distribution and AP preferences. However, other 
important metrics, such as user on-off behavior and repetitive 
patterns, are not included. We add these metrics 
to provide a more complete description of user behavior in 
wireless networks. 

Through this work, we also establish that although the 
general behavior of users is similar across previously collected 
traces, the detailed distributions vary due to differences in the 
underlying user population and/or trace collection methods. 
Hence, conclusions drawn from one trace may not 
generalize to all network environments. 

Recent work on protocol design for wireless networks 
usually relies on synthetic, random mobility models for 
performance evaluation [8], such as the random waypoint model 
or the random walk model. MNs in such synthetic models are 
always on and homogeneous in their behavior. Neither of these 
characteristics is observed in real wireless traces. We 
argue that to better serve the purpose of testing new protocols, 
we need models that capture the on-off and heterogeneous 
behavior we observe in the traces. 

There is little work on understanding relationships between 
users in wireless LANs in the current research literature. However, 
such understanding is crucial for further research on 
next-generation (socially aware or context-aware) protocols. In the 
second half of this paper we take a first step toward this end by 
proposing new methodologies and experiments to understand 
relationships between MNs in wireless LANs. 

III. Target environment and trace collection methods 

In this study we focus mainly on wireless traces collected 
from university campuses. We obtained wireless traces from four 
different universities, covering in total over 12,000 distinct 
users and over 1,300 APs. To the best of our knowledge, this is 
the most extensive study of user behavior in wireless networks 
so far. 

The traces have been collected on the studied campuses 
using different trace-collection methods. We summarize the 
important characteristics of these traces in Table I and explain 
the major issues below. 

TABLE I 
Statistics of studied traces 

Trace source | Unique users | Unique APs | Unique buildings | Trace duration | User type | Environment | Trace collection method | Analyzed part in this paper | Users in analyzed part | Labels used in graphs
MIT [1] | 1,366 | 173 | 3 | Jul. 20 '02 to Aug. 17 '02 | Generic | 3 engineering buildings | Polling | Whole trace | 1,366 | MIT-cons, MIT-rel
Dartmouth [3] | 10,296 | 623 | 188 | Apr. '01 to Jun. '04 | Generic | Whole campus | Event-based | Jul. 2003; Mar. 2004 | 2,518; 5,416 | Dart-03; Dart-04, Dart-rel, Dart-cons
UCSD [2] | 275 | 518 | N/A | Sep. 22 '02 to Dec. 8 '02 | PDA only | Whole campus | Polling | Sep. 22 '02 to Oct. 21 '02 | 275 | UCSD
USC | 4,548 | 79 (switch ports) | 73 | Dec. '03 to now (trap); Apr. 20 '05 to now (detail) | Generic | Whole campus | Event-based | Apr. 20 '05 to May 19 '05 | 4,528 | USC

The focus of the paper is on understanding the behavior of 
mobile nodes (MNs), including association with APs, mobility, 
repetitive patterns, encounters, and friendship. The four traces 
were chosen to represent different campus environments, user 
populations, location granularities, and trace-collection methods. 
We study the differences and similarities of user behavior 
in these traces and try to attribute them to the underlying 
differences between the traces as appropriate. To make the 
results below comparable across traces, we analyze only 
selected one-month chunks of the longer Dartmouth 
and UCSD traces. All of these traces except the UCSD trace contain 
measurements of generic wireless network users with various 
devices, including but not limited to laptops, PDAs, and VoIP 
devices [3]. The UCSD trace comes from a specific study of PDA 
users. All the traces except the MIT trace are collected from the 
entire campus wireless network. The MIT trace is collected from 
three engineering buildings, hence its user population is less 
diverse than in the other traces, and the geographic scope of 
trace collection is smaller. The USC trace is the only one with 
coarser, per-switch-port location granularity (approximately 
corresponding to buildings on campus), while the others have 
per-AP location granularity. These distinctions in the underlying 
environments may lead to differences in the metrics discussed 
below. 

The methods of collecting wireless network traces 
fall into two major categories: (i) polling-based 
methods, which record the association of MNs at periodic time 
intervals using SNMP [1] or association-tracking software 
on the MNs [2], and (ii) event-based methods, which record 
MN online/offline events using a logging server (e.g., syslog) 
[3]. It is generally accepted that the event-based approach 
provides more accurate records of MN behavior in the network. 
However, there is no in-depth study quantifying the differences 
between these two approaches. To further understand 
the effects of different trace-collection methods on 
the resulting trace, we also construct a reconstructed polling 
trace as follows: for an event-based trace, we observe the 
trace at regular time intervals and emulate what would be 
recorded if the trace had been taken by a polling-based method. 
We then process the reconstructed polling trace as we would a 
normal polling-based trace and compare the findings with the 
corresponding findings from the original event-based trace. We 
use the March 2004 Dartmouth trace (Dart-04) for this 
experiment, obtaining the Dart-cons and Dart-rel traces based on 
the conservative and relaxed assumptions detailed below. 
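The reconstruction step can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the tool used to produce Dart-cons and Dart-rel: we assume each MN's event-based trace has already been turned into (start, end, AP) intervals, and the function name `reconstruct_polling` and the 300-second default polling interval are hypothetical choices. 

```python
from typing import List, Optional, Tuple

# Hypothetical record format: (start_s, end_s, ap_id), meaning the MN was
# associated with ap_id from start_s (inclusive) until end_s (exclusive).
Record = Tuple[float, float, str]


def reconstruct_polling(records: List[Record], t0: float, t1: float,
                        interval: float = 300.0) -> List[Tuple[float, Optional[str]]]:
    """Emulate a poller that records the MN's AP every `interval` seconds.

    Returns (poll_time, ap_id or None) pairs; None means the poller saw no
    association at that epoch.
    """
    samples = []
    t = t0
    while t < t1:
        # The AP whose association interval covers this polling epoch, if any.
        ap = next((ap_id for start, end, ap_id in records if start <= t < end), None)
        samples.append((t, ap))
        t += interval
    return samples


# A 10-second session at AP "B" falls between two polls and is never recorded.
events = [(0, 100, "A"), (250, 260, "B"), (400, 1000, "C")]
print(reconstruct_polling(events, t0=0, t1=1200))
```

The example shows in miniature why a polling-based trace overlooks association sessions shorter than the polling interval. 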

For traces using the polling-based approach, the duration of 
association must be derived from observations made at constant 
intervals, which requires an assumption about how long each 
observed association lasts. We test two different 
assumptions: (a) a conservative approach (MIT-cons, 
Dart-cons), in which a MN is assumed to stay with the 
AP only until the next expected polling (recording) epoch, 
unless indicated otherwise by new samples in the trace. This 
approach faithfully reflects what is observed in the trace, but 
has the drawback that irregular polling intervals or 
lost SNMP records will suggest that the MN is 
switching between online and offline status while it has in 
fact been always on. (b) A more relaxed approach (MIT-rel, 
Dart-rel), in which a MN is assumed to stay with the AP for four 
polling intervals after it is observed there, unless indicated 
otherwise by the trace. This approach is more robust to 
disturbances in trace collection; however, it may erroneously 
extend the duration of association with APs after a MN is in 
fact offline. For the UCSD trace we use only the relaxed approach. 
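The two translation assumptions can be contrasted with a minimal Python sketch. Here each polling observation is assumed to last `hold_intervals` polling intervals (1 for the conservative case, 4 for the relaxed case), cut short by the next observation of any AP; this truncation rule, the function name `samples_to_sessions`, and the sample data are illustrative assumptions rather than the exact procedure used for the MIT, Dartmouth, or UCSD traces. 

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, Optional[str]]  # (poll_time_s, ap_id, or None if not seen)


def samples_to_sessions(samples: List[Sample], interval: float,
                        hold_intervals: int) -> List[Tuple[float, float]]:
    """Translate polling observations into association sessions.

    Each observation of an AP is assumed to last `hold_intervals` polling
    intervals, cut short by the next observation in the trace.
    hold_intervals=1 mimics the conservative assumption, 4 the relaxed one.
    Handoffs (an AP change with no gap) do not split a session.
    """
    seen = [(t, ap) for t, ap in sorted(samples, key=lambda s: s[0]) if ap is not None]
    sessions: List[List[float]] = []
    for i, (t, _ap) in enumerate(seen):
        end = t + hold_intervals * interval
        if i + 1 < len(seen):
            end = min(end, seen[i + 1][0])
        if sessions and t <= sessions[-1][1]:
            sessions[-1][1] = max(sessions[-1][1], end)  # contiguous: same session
        else:
            sessions.append([t, end])  # gap: offline, then a new online event
    return [(start, end) for start, end in sessions]


# Polls every 300 s; the record at t=600 was lost.
polls = [(0, "A"), (300, "A"), (600, None), (900, "A"), (1200, "A")]
print(samples_to_sessions(polls, 300, hold_intervals=1))  # conservative
print(samples_to_sessions(polls, 300, hold_intervals=4))  # relaxed
```

With one lost record, the conservative assumption splits a single visit into two sessions, while the relaxed assumption bridges the gap but keeps the MN associated for 20 minutes after its final observation. 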

IV. Analysis of individual user behavior 

In this section we use metrics to describe and compare 
behaviors of individual users (or MNs) in the studied 
environments. These metrics can be divided into four major 
categories as follows. (a) Activeness of users: This category 
captures the frequency of user participation in network activity. 
In general, wireless network users are not always on, but show 
up in the trace intermittently. (b) Long-term mobility of users: 
This category captures how widely a MN moves in the network 
in the long run (i.e., for the whole duration of the trace), 
and how a MN's online time is distributed among the APs. The 
intention here is to capture the tendency of a MN to visit 
various locations in the studied environment. (c) Short-term 
mobility of users: This category captures how MNs move 
in the network while they remain associated with some AP. The 
intention here is to capture the mobility of a MN while it uses 
the wireless network. (d) Repetitive association pattern of 
users: This category captures the user on-off behavior with 
respect to time of day and location. We expect users 
to show repetitive structure in their association patterns 
and propose the network similarity index (NSI), a quantitative 
metric to capture such repetitive patterns in user behavior. 

Fig. 1. CCDF of online time fraction (x-axis: online fraction of time) 

Before presenting the analyses using the above metrics, we 
first introduce some terminology: 

• Online event is defined as the event when a MN 
associates itself with an AP while it was not associated with 
any AP right before the online event. 

• Offline event is defined as the event when a MN 
disassociates itself from the AP to which it is currently 
associated and does not associate itself with any other AP 
right after the disassociation event. 

• Handoff event is defined as the event when a MN changes its 
association from one AP to another with no time gap in between 
(i.e., by issuing a re-associate at the second AP). 

• Association session is defined as the duration from an 
online event to the next offline event for the same MN. 
Handoff events do not terminate an association session. 

• Total online time is the sum of the time periods during which 
a MN is associated with any AP throughout the studied trace. 

• Existence time is the time difference between a MN's 
first online event and its last offline event in the studied 
trace. It is a conservative measure of the time duration 
for which the MN is a potential user of the network. 
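The terminology above maps onto a simple per-MN computation. The following Python sketch is hypothetical and assumes an event-based trace already reduced to (start, end, AP) association records per MN; a record that begins exactly when the previous one ends is treated as a handoff, and any gap as an offline event followed by an online event. 

```python
from typing import Dict, List, Tuple

Record = Tuple[float, float, str]  # (start_s, end_s, ap_id) for one AP association


def summarize_mn(records: List[Record]) -> Dict[str, object]:
    """Derive sessions, handoffs, total online time and existence time for one MN."""
    sessions: List[List] = []  # each entry: [start, end, handoff_count]
    prev_ap = None
    for start, end, ap in sorted(records):
        if sessions and start <= sessions[-1][1]:
            # No gap since the previous association: a handoff within the session.
            if ap != prev_ap:
                sessions[-1][2] += 1
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            # First record, or a gap: offline event followed by an online event.
            sessions.append([start, end, 0])
        prev_ap = ap
    return {
        "sessions": [(s, e) for s, e, _ in sessions],
        "handoffs_per_session": [h for _, _, h in sessions],
        "total_online": sum(e - s for s, e, _ in sessions),
        # From the first online event to the last offline event.
        "existence": sessions[-1][1] - sessions[0][0] if sessions else 0,
    }


mn = [(0, 100, "A"), (100, 200, "B"), (500, 600, "A")]
print(summarize_mn(mn))
```

Summing `handoffs_per_session` gives a MN's total handoff count, and its mean gives the average number of handoffs per association session. 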

A. The activeness of the users 

Activeness of users is the first aspect we examine in our attempt 
to compare the different traces. Activeness of users can be 
captured either by the total online time fraction of a MN or by 
the number of association sessions generated by a MN. 

Online time fraction is not straightforward to measure due 
to users joining or leaving the network. We define 
the online time fraction as the ratio of a MN's total online 
time to its existence time. We plot the CCDF of the online time 
fraction of users in the various traces in Fig. 1. 
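As a minimal sketch (hypothetical helper names, made-up input values), the online time fraction and its empirical CCDF can be computed as follows. The CCDF here is taken as the fraction of MNs with a value greater than or equal to x, and MNs with zero existence time are skipped to avoid division by zero; both are assumptions of this sketch. 

```python
from typing import List, Sequence, Tuple


def online_time_fraction(total_online: float, existence: float) -> float:
    """Ratio of a MN's total online time to its existence time."""
    return total_online / existence


def ccdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical CCDF: (x, fraction of values >= x) at each distinct x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs) if i == 0 or x != xs[i - 1]]


# (total_online, existence) pairs in seconds for three hypothetical MNs.
per_mn = [(300, 600), (50, 100), (7200, 7200)]
# MNs with zero existence time are skipped.
fractions = [online_time_fraction(on, ex) for on, ex in per_mn if ex > 0]
print(ccdf(fractions))
```

The CCDF value at x = 1.0 is the share of always-on users. 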

From Fig. 1 we observe that in all traces only a small portion 
of users are always on. Even for the most active Dartmouth 
trace (Dart-04), fewer than 30% of users are always on. 
These observations argue strongly that most users have on-off 
usage patterns, where some users are heavy users (with 
high online time) while many are light users. The distributions 
of the on/off times seem to depend heavily on the environment 
(i.e., campus). The UCSD trace, which focuses only on PDA users, 
is the least active among all traces. The other traces (MIT, 
USC, Dart-03) are not very different in their online time 
fraction distributions. The activeness of MNs in the Dartmouth 
trace increased from 2003 to 2004, which agrees with the findings 
in [3]. By comparing the curves of Dart-04, Dart-rel, and 
Dart-cons, we observe that the online time fraction is consistent 
for the same trace under different trace collection (or trace 
reconstruction) methods. Hence, this metric is insensitive to 
the tracing method. 

Fig. 2. CCDF of number of association sessions by users 

We further compare the CCDF of the number of association 
sessions generated by users in these traces in Fig. 2. We 
observe that the PDA users in the UCSD trace generate more 
association sessions than users in the other traces, which are 
generic wireless network devices (mainly laptops), over 
comparable trace durations. This fact, together with the lower 
online time fraction in Fig. 1, indicates that PDA users are 
more likely to use their devices for shorter but more frequent 
sessions. From the figure we also observe that the count of 
association sessions is sensitive to the trace collection method. 
While the two original Dartmouth traces (Dart-04 and 
Dart-03) show similar distributions, the reconstructed traces 
(Dart-rel and Dart-cons) show very different distributions from 
the original Dart-04 trace, since traces collected by polling at 
regular intervals overlook association sessions shorter than 
the polling interval. Another technical difficulty here lies in 
adequately translating a record seen in a polling-based trace 
into a duration of association. When we compare the curves of 
MIT-cons and MIT-rel, we find them drastically different. A closer 
investigation reveals that in the MIT trace, although SNMP 
polls are typically 5 minutes apart, records of MN association 
are sometimes obtained at longer intervals, and 
this leads to bogus terminations and re-initiations of association 
sessions if the conservative assumption is used, producing the 
high association session counts shown by the MIT-cons curve. 

On-off behavior is very common for wireless users. This 
seems especially true for small handheld devices. There are 
clear categories of heavy and light users, whose distribution 
is skewed and depends heavily on the campus. The 
'number of sessions' metric is sensitive to the trace collection 
method, while the 'online time fraction' is insensitive to such 
methods. 

Fig. 3. CCDF of coverage of users 

B. The long-term mobility of users 

In this section we capture the long-term mobility of users 
by obtaining the overall statistics of AP association history 
during the whole trace period. We investigate the number of 
APs a user associates with and the fraction of online time it 
associates with each of the APs. 

We define the coverage of a user as the percentage of APs it 
associates with during the trace period, i.e., the number of 
APs the node has associated with divided by the total number 
of APs in the trace. For the USC trace we use switch ports 
(approximately corresponding to buildings) in place of APs. The 
distributions of coverage of users in the traces are shown in 
Fig. 3. This metric captures how widely a user moves over the 
whole period of the trace in the studied network environment. 
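The coverage definition amounts to counting distinct APs. In this hypothetical Python sketch, repeated associations with the same AP count once, and `total_aps` is the AP (or, for USC, switch-port) count of the trace; the function name and data are illustrative only. 

```python
from typing import Dict, Iterable


def coverage(visited_aps: Iterable[str], total_aps: int) -> float:
    """Percentage of the trace's APs (or switch ports, for USC) a MN associated with."""
    return 100.0 * len(set(visited_aps)) / total_aps


# Hypothetical association history: repeated visits to the same AP count once.
history: Dict[str, list] = {"mn1": ["A", "B", "A"], "mn2": ["C"]}
print({mn: coverage(aps, total_aps=10) for mn, aps in history.items()})
```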

We observe that users have small coverage in all 
environments: in no trace does a user visit more than 40% of 
all APs. The MIT trace focuses on only three buildings, hence 
the relative coverage of users is much higher. In the UCSD trace, 
the PDA users seem likely to visit a larger portion of campus 
than the generic users do in the other campus-wide traces, 
presumably due to the portability of PDAs. Coverage seems to 
remain stable over time (compare Dart-03 and Dart-04), but it 
is sensitive to the trace collection method, since the 
polling-based method overlooks short sessions and 
underestimates the coverage metric. However, different 
reconstruction methods of the polling-based trace (cons, rel) 
result in the same coverage, as the metric counts the number 
of APs a MN associates with, not the association duration. 

In [1] the authors define prevalence as the fraction of online 
time a MN spends associated with each of the APs during 
the trace period. We follow that definition and compare the 
distribution of prevalence across different traces as follows. 
To understand how a user distributes its total online 
time among the APs it associates with, we order the APs 
by prevalence value for each MN and take the average 
of prevalence values across all MNs for the same AP rank 
to obtain the curves showing the average association time 
fraction in Fig. 4. 

Fig. 4. Average fraction of time a MN is associated with APs. For each MN, 
the AP list is sorted by prevalence value before taking the average. 
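The ranking-and-averaging step behind Fig. 4 can be sketched in Python as follows. This is an illustration under one assumption the text does not settle: a MN that associated with fewer than k APs contributes a prevalence of 0 at rank k, so the averaged curve sums to 1. The function names and data are hypothetical. 

```python
from collections import defaultdict
from typing import Dict, List


def prevalence(time_per_ap: Dict[str, float]) -> List[float]:
    """Fraction of one MN's online time spent at each AP, largest first."""
    total = sum(time_per_ap.values())
    return sorted((t / total for t in time_per_ap.values()), reverse=True)


def average_ranked_prevalence(per_mn: List[Dict[str, float]]) -> List[float]:
    """Average the k-th largest prevalence value over all MNs, for each rank k.

    Assumption: a MN that visited fewer than k APs contributes 0 at rank k,
    so the averaged curve still sums to 1.
    """
    sums: Dict[int, float] = defaultdict(float)
    for time_per_ap in per_mn:
        for rank, p in enumerate(prevalence(time_per_ap)):
            sums[rank] += p
    return [sums[k] / len(per_mn) for k in range(len(sums))]


# Online seconds per AP for two hypothetical MNs.
mns = [{"A": 60, "B": 30, "C": 10}, {"X": 100}]
print(average_ranked_prevalence(mns))
```

Reading the first value of the averaged curve, or the running sum of its first few values, gives the "time at the top AP" and "time at the top five APs" style of statistic discussed next. 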

From Fig. 4 we observe that for all environments, the 
general trend is that each user has very few APs at which it 
spends most of its online time. In particular, for all the traces, 
on average a MN spends more than 65% of its online time 
with one AP, and more than 95% of its online time at as few as 
5 APs. The left ends of the curves are similar, but the tails vary. 
The higher mobility of PDA users in the UCSD trace translates 
into a longer tail: in addition to their few most frequent 
APs, the users also access the wireless network at many more 
locations with small time fractions compared to other traces. 
This metric is robust to different trace collection methods and 
assumptions of trace translation, as the curve for Dart-04 is 
close to those for Dart-cons and Dart-rel. The same holds for the 
MIT traces. 

Individual users access only a very small portion of APs in 
the network, less than 40% in all campuses. The long-term 
mobility of users displays strong skewness of time associated 
with each AP. On average a user spends more than 95% of 
time at its top five most visited APs. 

C. The short-term mobility of users 

In this section we study the per-association-session mobility 
of a user, which reflects its short-term mobility. This 
captures a different dimension of a user than the previous 
section: how mobile is the user while using the network? We use 
handoff statistics as a measure of user mobility while using 
the network, looking at the distributions of both the total 
number of handoffs and the average number of handoffs per 
association session of each user. We plot these curves in 
Figs. 5 and 6, respectively. 

Our first intuition was that user mobility should depend 
on the device type, and that PDAs in the UCSD trace should 
display higher mobility than users in other traces. However, as 
shown in both figures, the UCSD trace does not show higher 
handoff counts than other traces. On the contrary, it is among 
the lowest in average handoffs per association session. This may 
be related to the fact that PDAs are usually used for short 
durations and hence experience fewer handoff events. The exact 
handoff count depends heavily on the network environment. 
In the USC trace, the coarse location granularity directly leads 
to smaller handoff counts. On the other hand, Dartmouth traces 
have many more handoffs per session on average than the 
other traces. 

Fig. 5. CCDF of total handoff count per MN 
Fig. 6. CCDF of average handoffs per association session 

We also observe that the handoff counts in Fig. 5 are sensitive 
to the trace collection method, as the curve for Dart-04 differs 
significantly from Dart-rel and Dart-cons. This is again because 
the polling-based method overlooks short association sessions, 
and hence many handoff events are not captured. 

From Fig. 6 we also observe that fewer than 20% of users 
have more than 10 handoffs per association session in all 
traces. This implies that most users are relatively stationary 
while using wireless devices. Also, the handoff statistics 
presented here are subject to the ping-pong effect [3], which 
refers to excessive handoff events due to disturbances in 
wireless channels while the MN itself might be stationary. Hence, 
we expect the actual short-term mobility of users to be even 
lower than the results we obtain from the traces directly. 

The majority of users experience low mobility while using 
the network. This is even true for portable devices such as 
PDAs. The actual handoff statistics depend heavily on the 
environment. 

D. The repetitive association pattern of users 

Naturally, user behavior changes with the time of 
day and the day of the week, as people follow daily and weekly 
schedules in their lives. In some cases, a user's association 
pattern repeats itself day to day or week to week. In this 
section we quantify such repetitive patterns by defining the 
network similarity index (NSI) below. 

Fig. 7. Network similarity indexes. The peaks represent intervals for which 
there is high similarity. 

We start with the definition of the location similarity index 
for individual users. First, we take a snapshot of the AP the 
user is associated with every minute. To study the tendency of a 
user to show repetitive behavior after a certain time gap (for 
example, every 24 hours), we consider all snapshot pairs that 
are separated by this time gap and calculate the fraction of 
all such pairs in which the user is associated with the same AP 
in both snapshots. This indicates how likely the user is to 
reappear at the same location after the chosen time gap. We 
consider only those snapshots that fall within the existence 
time of the user. The network similarity index (NSI) at a given 
time gap is the average of the location similarity index over 
all users at this time gap. 
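A minimal Python sketch of the location similarity index and the NSI follows. It assumes each user's 1-minute snapshots have already been restricted to the user's existence time and stored as a list of AP identifiers (None when offline); pairs in which the user is offline in either snapshot are counted as non-matching, and users with no pair at a given gap are left out of the average. These choices and the function names are assumptions of the sketch. 

```python
from typing import List, Optional, Sequence

Snapshots = Sequence[Optional[str]]  # AP at each 1-minute snapshot; None if offline


def location_similarity(snaps: Snapshots, gap: int) -> Optional[float]:
    """Fraction of snapshot pairs `gap` minutes apart that show the same AP.

    `snaps` should already be restricted to the user's existence time.
    Returns None if no pair fits inside that window.
    """
    pairs = len(snaps) - gap
    if pairs <= 0:
        return None
    same = sum(1 for i in range(pairs)
               if snaps[i] is not None and snaps[i] == snaps[i + gap])
    return same / pairs


def network_similarity_index(users: List[Snapshots], gap: int) -> Optional[float]:
    """Average location similarity over users that have at least one pair at this gap."""
    values = [v for v in (location_similarity(s, gap) for s in users) if v is not None]
    return sum(values) / len(values) if values else None


users = [["A", None, "A", None], ["B", "B", "C"]]
print(network_similarity_index(users, gap=2))
```

With 1-minute snapshots, a gap of one day corresponds to `gap=1440` and one week to `gap=10080`. 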

In Fig. 7 we show the NSI for all the traces. In most of 
these traces we observe a noticeably higher network similarity 
index when the time gap is an integer multiple of a day. This 
indicates that users have the strongest tendency to show 
repetitive association patterns at the same time each day. It 
is also interesting to observe that for the USC, MIT, and Dart-03 
traces, the network similarity index for a gap of 7 days (i.e., 
a week) is the second highest, only slightly lower than that for 
a gap of 1 day. This indicates that the weekly repetitive pattern 
is also strong in these traces. On the other hand, the UCSD trace 
shows little repetitive pattern, as there are almost no obvious 
spikes in its NSI curve. This can be attributed to its user 
population consisting of PDA users. Unlike laptops, which are more 
related to work, PDAs are usually used more casually in short, 
scattered durations. Hence it is expected that PDA users show 
less repetitiveness in their usage pattern. 

The "average value" of the NSI curves reflects the fraction 
of users that always stay at the same location. In that sense, we 
see that Dartmouth trace has the most stationary users. This 
may be attributed to the fact that Dartmouth traces include 
users in student dormitories, which are mainly stationary 
users and have high location similarity indexes. USC has not 
deployed WLAN in dormitories yet, and MIT trace is mainly 
focused on buildings for work. 

For the Dart-04 trace we observe significantly higher values 
in the NSI curve. Another interesting distinction between the 
Dart-04 trace and the others is that Dart-04 does not show a 
second peak in network similarity index at the 7-day gap. Instead, 
the network similarity index decreases as the time gap increases. 
A closer investigation of the Dartmouth academic calendar reveals 
that Dartmouth switches from the winter quarter to the spring 
quarter in the middle of March. We suspect that the decreasing 
NSI with respect to time gap might be attributed to people 
deviating from normal daily/weekly schedules at the end or 
beginning of a quarter. 

We observe clear repetitive patterns of association in 
wireless network users. Typically, user association patterns 
show the strongest repetitive pattern at time gap of one day 
and the second strongest at one week. 

V. Relationship between nodes 

In addition to the individual user behavior studied in the
previous section, observing the relationships between MNs is also
important for understanding the characteristics of the traces. In
this section we first investigate the distributions of encounters
between MNs. Then we propose metrics to capture closeness
(e.g., friendship) among MNs. Considering encounters as a
way to build up relationships between MNs, we study how
MNs form a relationship network via encounters by defining
the encounter-relationship graph (ER graph), which lets us observe
clustering and the degree of separation among MNs and contrast them
with the SmallWorld model [11]. Finally, we carry out simulations
to test the potential of information dissemination based on
encounters alone. All these experiments serve as vehicles to
help us further understand the underlying structure of these
traces.

A. Encounters between Nodes 

Nodal encounters in mobile networks are important events,
as they provide opportunities for the involved nodes to build up
a relationship or to communicate directly. Here we define
an encounter event as the period during which two MNs are
associated with the same AP during overlapping time intervals. The
wireless LAN traces provide sequences of AP (switch port for the
USC trace) association history for MNs in the network. We
can derive when MNs encounter each other by simply
comparing individual association traces. The distribution of
these encounter events is the first step toward understanding the
structure of MN relationships in the traces. The direct questions
to ask about the encounter events are: What proportion
of other nodes does a typical node meet? Do nodes meet each
other repeatedly or not?
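The comparison of individual association traces described above can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes each association session is a hypothetical `(node, ap, start, end)` tuple with times in seconds, and it compares every pair of sessions recorded at the same AP, which is quadratic per AP but adequate for a sketch. Intervals that merely touch (one ends exactly when the other starts) are not counted as encounters.

```python
from collections import defaultdict
from itertools import combinations

def find_encounters(sessions):
    """Derive encounter events from association sessions.

    sessions: iterable of (node, ap, start, end) tuples (hypothetical format).
    Returns (a, b, ap, start, end) tuples, one per overlapping session pair,
    with a < b and [start, end) the period both nodes share the AP.
    """
    by_ap = defaultdict(list)
    for node, ap, start, end in sessions:
        by_ap[ap].append((node, start, end))

    events = []
    for ap, records in by_ap.items():
        # Compare every pair of sessions seen at this AP (switch port for USC).
        for (n1, s1, e1), (n2, s2, e2) in combinations(records, 2):
            if n1 == n2:
                continue
            start, end = max(s1, s2), min(e1, e2)
            if start < end:  # the two association intervals overlap
                a, b = sorted((n1, n2))
                events.append((a, b, ap, start, end))
    return events

sessions = [
    ("A", "ap1", 0, 10), ("B", "ap1", 5, 15),
    ("C", "ap1", 10, 20), ("C", "ap2", 0, 10), ("A", "ap2", 20, 30),
]
print(find_encounters(sessions))
# [('A', 'B', 'ap1', 5, 10), ('B', 'C', 'ap1', 10, 15)]
```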

Fig. 8 shows the CCDF of the fraction of MNs a given MN has
encountered over the whole trace period. From the figure
we observe that every node encounters less than 40% of
the user population within a month, with the UCSD trace being
the only exception. This may be partly due to the fact that the
275 PDA users in the UCSD trace were all selected from the freshman
class, and they tend to stay in several common dorms, as stated
in [2]. In all the other traces, on average a MN encounters
only 1.88% (Dart-03) to 5.94% (USC) of the whole user
population within the 30-day trace period. This rather limited
encountered population stems naturally from the fact that a
MN does not visit a large portion of the network, as shown in the
previous section in Fig. 3, and that different MNs have different
locations they visit on a regular basis.
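The per-MN quantity plotted in Fig. 8 can be computed from encounter events along these lines. The sketch assumes events are tuples whose first two fields are the two MN identifiers, and that `nodes` lists the whole user population; the CCDF is reported as P[X >= x] at each distinct value, which is one common plotting convention rather than necessarily the one used for the figure.

```python
from bisect import bisect_left

def unique_encounter_fraction(events, nodes):
    """Fraction of the other MNs in `nodes` that each MN met at least once."""
    met = {n: set() for n in nodes}
    for a, b, *_ in events:
        met[a].add(b)
        met[b].add(a)
    others = len(nodes) - 1
    return {n: len(peers) / others for n, peers in met.items()}

def empirical_ccdf(values):
    """Return (x, P[X >= x]) for each distinct value x, in increasing order."""
    xs = sorted(values)
    n = len(xs)
    # bisect_left counts how many samples are strictly smaller than x.
    return [(x, (n - bisect_left(xs, x)) / n) for x in sorted(set(xs))]

events = [("A", "B"), ("B", "C"), ("A", "B")]  # the A-B meeting repeats
frac = unique_encounter_fraction(events, ["A", "B", "C", "D"])
print(frac)
print(empirical_ccdf(frac.values()))
```

Averaging `frac.values()` over all MNs gives the kind of per-trace average quoted above (1.88% to 5.94%); repeated meetings between the same pair do not raise the fraction.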
We also show the CCDF of the total number of encounter events
a MN has throughout the trace period in Fig. 9. Again
we observe a significant difference in total encounter counts,
further evidence of heterogeneous behavior among MNs. The
actual number of total encounters depends on the size of the
population in the traces. Large traces (i.e., the USC and Dartmouth
traces) tend to have more encounters than small traces (i.e., the
UCSD trace). However, regardless of the trace population,
the curves for total encounter count appear to follow a BiPareto
distribution [9]. We fit BiPareto distribution curves to the
empirical distribution curves and use the Kolmogorov-Smirnov
test to examine the quality of fit. The resulting D-statistics for
all traces lie between 0.024 and 0.068, which indicates a
reasonably good fit between the BiPareto distribution
curves and the empirical distribution curves. For details about
the Kolmogorov-Smirnov test and the parameters of the fitted
curves, please refer to Appendix A.
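The Kolmogorov-Smirnov D-statistic used above is the largest vertical gap between the empirical CDF of the sample and the fitted CDF. A minimal sketch follows. The BiPareto form itself is given in Appendix A, so here the fitted distribution is passed in as an arbitrary CDF function, and the uniform CDF in the example is only a placeholder. SciPy users can obtain the same statistic from `scipy.stats.kstest`.

```python
from typing import Callable, Iterable

def ks_d_statistic(sample: Iterable[float], cdf: Callable[[float], float]) -> float:
    """D = sup_x |F_n(x) - F(x)| for an empirical CDF F_n and a continuous CDF F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x: float) -> float:
    """Placeholder fitted CDF (uniform on [0, 1])."""
    return min(max(x, 0.0), 1.0)

print(ks_d_statistic([0.25, 0.75], uniform_cdf))  # 0.25
```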

A closer investigation into the unique encounter count and total
encounter count of the same MN reveals that a high unique
encounter count does not always imply a high total encounter
count, as shown in Fig. 10 using the USC trace as an example. The
correlation coefficient between unique encounter count and total
encounter count is only 0.585 in this example. This implies that
some node pairs have many repeated encounters, suggesting a
closer relationship between such node pairs than others. In the
next section, we introduce the notion of friendship between
MNs to further understand this phenomenon.

In all the traces, the MNs encounter a relatively small
fraction of the user population: below 40% in most cases and
never above 60% in any case. Except for the UCSD
trace, on average a MN encounters only 1.88%-5.94% of
the whole population. The number of total encounters for
the users follows a BiPareto distribution, the parameters of
which depend on the campus.

B. Friendship between Nodes 

In our daily lives, we are bound to meet colleagues
and friends much more often than others. In this section we
use the wireless LAN traces to investigate whether such an
uneven distribution of closeness among MN pairs exists, and
measure it using the concept of friendship dimensions.
The likelihood of encounters occurring between two given MNs
captures the friendship between them. This "friendship" in
WLAN traces may or may not reflect social friendship, which is
impossible to validate from anonymized traces. We propose to
identify friendship between MN pairs along three different
dimensions: encounter duration, encounter count, and location
diversity of encounters, with the following definitions:

• Friendship based on encounter time: We define the
friendship index based on encounter duration as
$Frd_t(A,B) = E_t(A,B)/OT(A)$, which is the ratio of the
sum of encounter durations between nodes A and B,
$E_t(A,B)$, to the total online time of node A, $OT(A)$. This
is an index of how good a friend node B is to node
A based on the duration of encounters. Note that in general
$Frd_t(A,B) \neq Frd_t(B,A)$, and $0 \le Frd_t(A,B) \le 1$
for any node pair A and B.

• Friendship based on encounter count: The friendship
index based on encounter count is defined as
$Frd_c(A,B) = E_c(A,B)/S(A)$, which is the ratio of the
number of association sessions of node A that contain
encounter events with node B, $E_c(A,B)$, to the total
association session count of node A, $S(A)$.

• Friendship based on encounter location diversity: The
friendship index based on location diversity of encounters
is defined as $Frd_l(A,B) = E_l(A,B)/L(A)$, which is
the ratio of the number of locations at which node A
has encounters with B, $E_l(A,B)$, to the total number of
locations node A visits, $L(A)$.
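The three friendship indexes can be computed directly from the association sessions of the two nodes. In this sketch, sessions are hypothetical `(node, ap, start, end)` tuples, each node's own sessions are assumed not to overlap one another, and AP identifiers stand in for "locations"; these are illustrative assumptions, not the authors' implementation. A node with no sessions would cause a division by zero and is not handled.

```python
def friendship_indexes(sessions, a, b):
    """Return (Frd_t(a, b), Frd_c(a, b), Frd_l(a, b)) from (node, ap, start, end) sessions."""
    sa = [(ap, s, e) for n, ap, s, e in sessions if n == a]
    sb = [(ap, s, e) for n, ap, s, e in sessions if n == b]

    online_time = sum(e - s for _, s, e in sa)   # OT(a)
    locations = {ap for ap, _, _ in sa}          # locations counted by L(a)
    encounter_time = 0.0                         # E_t(a, b)
    sessions_with_b = 0                          # E_c(a, b)
    encounter_locations = set()                  # locations counted by E_l(a, b)

    for ap, s, e in sa:
        met_b = False
        for ap_b, s_b, e_b in sb:
            overlap = min(e, e_b) - max(s, s_b)
            if ap == ap_b and overlap > 0:
                encounter_time += overlap
                encounter_locations.add(ap)
                met_b = True
        sessions_with_b += met_b  # count sessions of a that contain an encounter

    return (encounter_time / online_time,
            sessions_with_b / len(sa),
            len(encounter_locations) / len(locations))

sessions = [
    ("A", "ap1", 0, 10), ("A", "ap2", 20, 30), ("A", "ap3", 40, 50),
    ("B", "ap1", 5, 15), ("B", "ap3", 45, 60),
]
print(friendship_indexes(sessions, "A", "B"))  # Frd(A, B)
print(friendship_indexes(sessions, "B", "A"))  # (0.4, 1.0, 1.0): differs from Frd(A, B)
```

The shared encounter time (10 s) is the same in both directions, but A is online longer and visits more places, so each index is smaller from A's side; this is the asymmetry noted in the definitions.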



The above three dimensions can be used to understand
friendship between MNs from different perspectives. Different
relationships between MNs may lead to different friendship
index values along these dimensions. For example, two users may
have a high friendship index based on encounter time and
encounter count, but not from the location diversity perspective.

We first observe how friendship indexes are distributed among all
ordered node pairs in the campuses studied. As shown in Fig. 11,
the CCDF curves of friendship indexes based on encounter
time follow exponential distributions for all campuses. Again
we use the Kolmogorov-Smirnov test to examine the quality of
fit. The resulting D-statistics for all traces lie between 0.0052
and 0.0356, which indicates a reasonably good fit
between the exponential distribution curves and the empirical
distribution curves. Please see Appendix A for a brief
introduction to the Kolmogorov-Smirnov test and the detailed
parameters of the fitted exponential distribution curves. Although
the traces are collected from different campuses with different
methods, the shape of the friendship index distribution curves
remains the same. We observe higher friendship indexes for the
USC trace because its associations are measured at a coarser
granularity (switch port level), which makes it more likely for
MNs to encounter one another.
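A minimal sketch of an exponential fit: the maximum-likelihood rate of an exponential distribution is the reciprocal of the sample mean, and the fitted CCDF is exp(-rate * x). Using maximum likelihood, and fitting only non-zero indexes, are assumptions here; the fitted parameters actually used are listed in Appendix A. The helper compares the empirical and fitted tail at a chosen threshold, such as the 0.01 cutoff discussed below.

```python
import math

def fit_exponential_rate(samples):
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return len(samples) / sum(samples)

def tail_comparison(samples, threshold):
    """(empirical P[X > threshold], fitted exponential P[X > threshold])."""
    rate = fit_exponential_rate(samples)
    empirical = sum(1 for x in samples if x > threshold) / len(samples)
    return empirical, math.exp(-rate * threshold)

indexes = [0.125, 0.25, 0.375]  # toy non-zero friendship indexes, not trace data
print(fit_exponential_rate(indexes))  # 4.0
print(tail_comparison(indexes, 0.25))
```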

To understand the effect of different trace collection methods
on the friendship index, we compare the distribution curves
from Dart-04, Dart-rel and Dart-cons. The reconstructed traces
tend to omit association sessions that are shorter than the
sampling period, and hence under-estimate the total online time
of nodes, $OT(A)$, leading to slightly larger friendship indexes.

An exponential distribution of the friendship index indicates
that the majority of nodes do not have a tight relationship with one
another. In all the traces, fewer than 5% of ordered node
pairs (A,B) have a friendship index $Frd_t(A,B)$ larger than
0.01. This reveals that, in addition to the limited fraction
of nodes with encounter events shown in the previous
section, even among node pairs that do encounter each other,
most do not show a tight relationship. Among all node
pairs with a non-zero friendship index, only 4.47% have a
friendship index larger than 0.7, and another 11.85% have a
friendship index between 0.4 and 0.7. Friendship
indexes based on encounter count or location diversity of
encounters also show similar exponential distributions.

We next examine whether the friendship index
for an ordered node pair, $Frd_t(A,B)$, and that of its reversed
tuple, $Frd_t(B,A)$, are symmetric. We plot the friendship index
based on time for all node pairs with non-zero encounter
duration in Fig. 12, using the MIT trace as an example. The scatter
diagram shows that friendship indexes are highly asymmetric
across node pairs. From Table II we can see that, for all
three dimensions used to define friendship indexes, the correlation
coefficients between ordered node pairs (A,B) and (B,A)
are mostly low in all traces, implying high asymmetry in
friendship indexes.

Friendship between MNs is highly asymmetric. The distribution
of the friendship index is exponential for all the
traces, regardless of the friendship definition (based on time,
encounter count, or location). Among all node pairs, fewer
than 5% have a friendship index larger than 0.01, and fewer
than 1% have a friendship index larger than 0.4.

C. Encounter-relationship graph 

In Section V-A we saw that MNs have a small percentage
of unique encounters among the whole population. Given
this fact, we ask whether it is possible to
establish campus-wide relationships among the majority of
MNs via encounters alone. That is, do encounters link MNs
on campus into one single community, or just into small,
separate cliques?

TABLE II

Correlation coefficients of friendship indexes between ordered node pairs (A,B) and (B,A) for all traces

| Trace name | Based on encounter time | Based on encounter count | Based on location diversity |
|---|---|---|---|
| MIT-rel | 0.415 | 0.327 | 0.186 |
| UCSD | -0.024 | -0.004 | -0.003 |
| USC | 0.158 | 0.205 | 0.130 |
| Dart-03 | 0.351 | 0.278 | 0.043 |
| Dart-04 | 0.629 | 0.201 | 0.068 |

To investigate this question, we define a static encounter-relationship
graph (ER graph) as follows: each MN is represented by a node in the
ER graph, and an edge is added between two nodes if the two
corresponding MNs have encountered each other at least once during
the studied trace period. The concept of the ER graph is introduced
to capture the potential for establishing relationships based on
direct encounters.
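The ER-graph construction, restricted to a studied trace period, can be sketched as follows. It assumes encounter events are `(a, b, ap, start, end)` tuples and that an encounter belongs to the period if it starts before `period_end`; both the tuple format and the cutoff rule are illustrative assumptions. Only MNs that appear in the period should be passed in `nodes`.

```python
def er_graph(nodes, events, period_end):
    """Undirected ER graph as {node: set(neighbours)} for encounters before period_end."""
    graph = {n: set() for n in nodes}
    for a, b, _ap, start, _end in events:
        if start < period_end:
            graph[a].add(b)  # an edge for at least one encounter in the period
            graph[b].add(a)
    return graph

events = [("A", "B", "ap1", 5, 10), ("B", "C", "ap1", 10, 15), ("C", "D", "ap2", 90, 95)]
print(er_graph(["A", "B", "C", "D"], events, period_end=60))
```

With `period_end=60`, node D is still isolated; extending the period to include the C-D encounter connects it, which is the effect studied below as the trace period grows.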

We use three important metrics to describe the characteris- 
tics of encounter-relationship graphs, defined as follows: 

• Clustering coefficient (CC) describes the
tendency of nodes to form cliques in the graph. It is
formally defined as:

$$CC = \frac{1}{M}\sum_{i=1}^{M} CC(i), \qquad CC(i) = \frac{\sum_{j \in N(i)} \sum_{k \in N(i),\, k \neq j} \mathbb{1}(j \in N(k))}{Nbr(i)\,\bigl(Nbr(i)-1\bigr)},$$

where $\mathbb{1}(\cdot)$ is the indicator function, $Nbr(i)$ is the
number of neighbors node i has, $N(i)$ is the set of neighbors of
node i, and M is the total number of nodes in the graph.
Intuitively, the clustering coefficient is the average, over all
nodes, of the fraction of pairs of a node's neighbors that are also
neighbors of one another.

• Disconnected ratio (DR) describes the connectivity of
the ER graph. It is defined as:

$$DR = \frac{\sum_{A=1}^{M} \sum_{B \notin C(A)} 1}{M(M-1)},$$

where $C(A)$ is the set of nodes that are in the same
connected sub-graph as node A.

• Average path length (PL) describes the degree
of separation of nodes in the ER graph. It is formally
defined as:

$$PL = (1 - DR)\cdot PL_{con} + DR \cdot PL_{disc},$$

where $PL_{con}$ is the average path length over the
connected part of the ER graph, defined as:

$$PL_{con} = \frac{\sum_{A=1}^{M} \sum_{B \in C(A),\, B \neq A} PL(A,B)}{\sum_{A=1}^{M} \sum_{B \in C(A),\, B \neq A} 1},$$

$PL(A,B)$ is the hop count of the shortest path between
node pair (A,B), and $PL_{disc}$ is the penalty on average path
length for disconnected node pairs in the ER graph. In the
following we use the average path length of regular
graphs (defined later) with the same number of nodes and
average node degree as $PL_{disc}$.
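The three metrics can be computed on an adjacency-set representation as sketched below. Assumptions: the graph is a dictionary mapping each node to its set of neighbours; CC(i) is taken as 0 for nodes with fewer than two neighbours, where the formula is undefined and the text does not say how such nodes are handled; shortest paths are found by breadth-first search; and `pl_disc` is supplied by the caller (the text uses the regular-graph average path length).

```python
from collections import deque

def clustering_coefficient(adj):
    """Average over all M nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumption: CC(i) = 0 when Nbr(i) < 2
        linked = sum(1 for j in nbrs for m in nbrs if j != m and j in adj[m])
        total += linked / (k * (k - 1))
    return total / len(adj)

def hop_counts(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def dr_and_pl(adj, pl_disc):
    """Return (DR, PL) with PL = (1 - DR) * PL_con + DR * PL_disc."""
    m = len(adj)
    disconnected = hop_sum = connected_pairs = 0
    for a in adj:
        dist = hop_counts(adj, a)
        reachable = len(dist) - 1            # |C(A)| - 1
        disconnected += (m - 1) - reachable  # nodes B not in C(A)
        hop_sum += sum(dist.values())
        connected_pairs += reachable
    dr = disconnected / (m * (m - 1))
    pl_con = hop_sum / connected_pairs if connected_pairs else 0.0
    return dr, (1 - dr) * pl_con + dr * pl_disc

# A triangle plus a separate edge: 5 nodes, two components.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"E"}, "E": {"D"}}
print(clustering_coefficient(adj))  # 0.6
print(dr_and_pl(adj, pl_disc=5.0))  # approximately (0.6, 3.4)
```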
We study how the above metrics evolve for the ER graphs
derived from WLAN trace periods of various lengths. Taking the
USC trace, the Dartmouth trace (Dart-04), and the UCSD trace as
examples, we show the evolution of the three metrics with
respect to the trace period in Fig. 13(a)-(c). The
graphs for the other traces are not shown here due to limited space,
but they show very similar trends; they are available in
Appendix B. To highlight a unique property
of these ER graphs, we also calculate CC and PL for regular
graphs and random graphs with the same total
node number M and average node degree d. In regular graphs,
nodes are first arranged on a circle and each node is connected
to its d closest neighbors on the circle. In random graphs, d
randomly chosen nodes are assigned as neighbors of each
node. Typically, regular graphs have high CC and PL, while
random graphs have low CC and PL.
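The two reference graphs can be generated as sketched below. The ring lattice follows the description above, with d assumed even so that each node links to d/2 neighbours on each side. For the random graph, the sketch substitutes a uniform random graph with exactly M·d/2 undirected edges (an Erdős–Rényi G(M, m) graph), which fixes the average degree at d; this differs slightly from the per-node neighbour assignment described in the text, which can yield an average undirected degree above d.

```python
import random

def ring_lattice(m, d):
    """Regular graph: m nodes on a circle, each linked to its d nearest neighbours."""
    if d % 2 or not 0 < d < m:
        raise ValueError("d must be even and 0 < d < m")
    adj = {i: set() for i in range(m)}
    for i in range(m):
        for step in range(1, d // 2 + 1):
            j = (i + step) % m
            adj[i].add(j)
            adj[j].add(i)
    return adj

def random_graph(m, d, seed=None):
    """Uniform random graph with m * d // 2 undirected edges (average degree about d)."""
    if not 0 < d < m:
        raise ValueError("need 0 < d < m")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m * d // 2:
        a, b = rng.sample(range(m), 2)  # two distinct nodes, so no self-loops
        edges.add((min(a, b), max(a, b)))
    adj = {i: set() for i in range(m)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

lattice = ring_lattice(10, 4)
print(sorted(lattice[0]))  # [1, 2, 8, 9]
rand = random_graph(10, 4, seed=1)
print(sum(len(v) for v in rand.values()) / 10)  # 4.0
```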

From Fig. 13(a) we note that the ER graphs for generic
wireless users (USC and Dart-04) have low DR, which implies
that nodal encounters are sufficient to provide opportunities to
connect almost all nodes into a single community, even though
some of them are online for only short durations. This is an
encouraging result that points to the feasibility of building a
large, campus-wide relationship network relying only on direct
encounters. For the UCSD trace, DR is higher. We suspect this is
because the total number of users is comparatively small (only 275
PDA users). We further point out that although DR starts out
very high for very short trace periods, as MNs have not yet moved
around to create encounters, it decreases rather quickly as the
trace period increases. Within one day, the DRs fall to within
3% of their corresponding final values.

Fig. 13. Change in the ER graph metrics with respect to trace period

Another interesting finding is revealed by taking a closer
look at the other two metrics, clustering coefficient (CC)
and average path length (PL). In Fig. 13(b), we show the
normalized CCs and PLs for various trace periods. These
normalized metrics are obtained by the following equations:

$$CC_{norm} = \frac{CC - CC_{rand}}{CC_{reg} - CC_{rand}} \times 0.99 + 0.01,$$

$$PL_{norm} = \frac{PL - PL_{rand}}{PL_{reg} - PL_{rand}} \times 0.99 + 0.01,$$

where $CC_{norm}$ and $PL_{norm}$ represent the normalized CC and
PL, respectively. The subscripts reg and rand indicate that the
corresponding metric is obtained from the regular graph and the
random graph, respectively, with the same total node number
and average node degree. These normalized metrics show where the
metrics obtained from the ER graphs fall on a scale from 0.01
(corresponding to the random graph) to 1 (corresponding to the
regular graph).
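The normalization is a linear map that sends the random-graph value to 0.01 and the regular-graph value to 1. A direct transcription is sketched below; the function name and the example values are hypothetical.

```python
def normalize(value, rand_value, reg_value):
    """Map a metric onto [0.01, 1]: 0.01 at the random-graph value, 1 at the regular-graph value."""
    if reg_value == rand_value:
        raise ValueError("regular and random reference values must differ")
    return (value - rand_value) / (reg_value - rand_value) * 0.99 + 0.01

# Hypothetical values: a metric halfway between the random (0.0) and regular (1.0) references.
print(round(normalize(0.5, 0.0, 1.0), 3))  # 0.505
```

The same function applies to CC and PL; an ER graph that is SmallWorld-like gives a normalized CC near 1 and a normalized PL near 0.01.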

We observe that ER graphs display high CCs that are
close to those of the corresponding regular graphs (i.e., normalized
CCs close to 1), and low normalized PLs, which
are close to those of the corresponding random graphs. This
highlights that a special pattern of encounters exists in all network
traces: nodes having the same home AP are highly likely
to encounter all the others sharing that AP, which introduces highly
connected clusters among these nodes and leads to high CC. Some of
the nodes in one cluster also have random encounters with
nodes in other clusters, and those links serve as "shortcuts" in
the ER graphs that reduce PL. In previous literature, graphs
with high CC close to that of regular graphs and low PL close to that
of random graphs are referred to as SmallWorld graphs [11], [12].
By looking at various traces, we find that the relationships
formed by encounters among wireless network users
are also an instantiation of SmallWorlds. We also observe that
both PL and CC converge to their final values rather quickly, in
about one day, for the USC and Dart-04 traces. Although the number
of nodes in the ER graph keeps increasing as the studied trace period
increases, as shown in Fig. 13(c), this does not change these
metrics much.

Encounters link most of the MNs together in a connected
graph, even though each MN encounters only a small portion of
the whole population. The encounter graph is a SmallWorld
graph, and even for short time periods its clustering coefficient,
average path length, and connectivity are all close to those
for longer traces.

D. Encounter-relationship graph with friends 

In the previous section, the ER graph was constructed by
using all encounters as links between nodes. Typically, however,
a MN may maintain relationships selectively, only with
those MNs that it considers "trustworthy". For example, a
MN may choose to trust those MNs with which it has high
friendship indexes. The criteria for choosing the nodes with which
to keep a relationship may influence the structure of ER graphs. This
issue is the main focus of this section.

We defined the metrics for friendship in wireless networks
in Section V-B. Now we include friends with various
degrees of closeness in the ER graph and see how this influences
the structure of the graph. We use the friendship index based on
time as an example to show how choosing encounters with
different degrees of closeness can change the structure of the ER
graph significantly.

We sort the list of nodes that a node A has encountered
according to friendship index, $Frd_t(A,B)$, where B is a node
that encountered A at least once. After sorting, each node
picks a certain percentage of nodes from the list
with which to establish relationships. We choose nodes from the
top, middle, or bottom of the list with various percentages,
and obtain the corresponding metrics for the new ER graphs
that include only the links to the chosen nodes.
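The selection step can be sketched as follows. The details are hypothetical: the sketch assumes friendship indexes are stored in a dictionary keyed by ordered pair `(A, B)`, breaks ties by node identifier, rounds the number of chosen friends down, and centres the "middle" window in the ranked list; the text does not specify these choices.

```python
from collections import defaultdict

def choose_friends(ranked, fraction, position):
    """Pick a fraction of a ranked list (highest friendship index first)."""
    k = int(len(ranked) * fraction)  # assumption: round down
    if position == "top":
        start = 0
    elif position == "middle":
        start = (len(ranked) - k) // 2
    elif position == "bottom":
        start = len(ranked) - k
    else:
        raise ValueError(f"unknown position: {position}")
    return ranked[start:start + k]

def friend_graph(frd, fraction, position):
    """Directed graph: node -> set of friends it chooses among the nodes it encountered."""
    candidates = defaultdict(list)
    for (a, b), value in frd.items():
        if value > 0:
            candidates[a].append((value, b))
    graph = {}
    for a, lst in candidates.items():
        ranked = [b for _, b in sorted(lst, key=lambda t: (-t[0], t[1]))]
        graph[a] = set(choose_friends(ranked, fraction, position))
    for targets in list(graph.values()):
        for b in targets:
            graph.setdefault(b, set())  # every chosen friend must also be a node
    return graph

frd = {("A", "B"): 0.5, ("A", "C"): 0.1, ("A", "D"): 0.3, ("B", "A"): 0.2}
print(choose_friends(list("abcdefghij"), 0.2, "middle"))  # ['e', 'f']
print(friend_graph(frd, 0.5, "top"))  # {'A': {'B'}, 'B': set()}
```

Because the count is rounded down, a node that encountered only one other node keeps no friends at fractions below 100%; a different rounding rule would change the graphs for sparsely connected nodes.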

In this case, one minor modification needs to be made to
the metrics introduced earlier. Since friendship indexes are
asymmetric, as shown in Section V-B, it is possible that node
A has chosen to include node B in the ER graph, but not
vice versa. Hence the modified ER graph becomes a directed
graph instead of an undirected one. The definition of the clustering
coefficient is modified as follows:

$$CC = \frac{1}{M}\sum_{i=1}^{M} CC(i), \qquad CC(i) = \frac{\sum_{j \in F(i)} \sum_{k \in F(i),\, k \neq j} \mathbb{1}(j \in F(k))}{Frd(i)\,\bigl(Frd(i)-1\bigr)},$$

where $\mathbb{1}(\cdot)$ is the indicator function, $Frd(i)$ is the
number of friends node i chooses to include in the graph, $F(i)$ is the
set of chosen friends of node i, and M is the total number
of nodes in the graph. Note that $A \in F(B)$ does not imply
$B \in F(A)$.

When calculating average path length and disconnected 
ratio, the paths must follow the direction of edges on the graph. 
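For the directed friend graph, the modified clustering coefficient and the disconnected ratio can be sketched as below. `F` maps each node to the set of friends it chose and must contain every node as a key. CC(i) is again taken as 0 when a node chooses fewer than two friends (an assumption), and reachability is computed by breadth-first search along edge directions, so node B counts as connected to A only if a directed path leads from A to B.

```python
from collections import deque

def directed_cc(F):
    """Modified CC: fraction of ordered friend pairs (j, k) of i with j in F(k)."""
    total = 0.0
    for i, friends in F.items():
        k = len(friends)
        if k < 2:
            continue  # assumption: CC(i) = 0 when Frd(i) < 2
        hits = sum(1 for j in friends for other in friends
                   if other != j and j in F.get(other, set()))
        total += hits / (k * (k - 1))
    return total / len(F)

def directed_dr(F):
    """Fraction of ordered pairs (A, B), A != B, with no directed path from A to B."""
    m = len(F)
    unreachable = 0
    for src in F:
        seen = {src}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in F.get(u, set()):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        unreachable += m - len(seen)
    return unreachable / (m * (m - 1))

F = {"A": {"B", "C"}, "B": {"C"}, "C": set()}  # A chose B and C; C chose nobody
print(directed_cc(F))  # about 0.1667: only C is in F(B); B is not in F(C)
print(directed_dr(F))  # 0.5
```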

Following the modified definitions, we obtain the metrics
when including given percentages of all encountered nodes
from the top, middle, or bottom of the encountered-node
list sorted by friendship index based on time. The results
are shown in Fig. 14. We use the USC trace with a 30-day trace
duration as an example; similar results are also observed
in the other traces.

The figures show a clear trend: if neighbors ranked high
in friendship index are included, the resulting graph shows
stronger clustering and a much higher average path length.
This result stems from the fact that the top friends of a given
node are also likely to be top friends of one another,
forming small cliques in the graph. The clustering coefficient
remains high due to these cliques, while the disconnected ratio and
average path length are high due to the lack of links between
different cliques. On the other hand, when low-ranked friends
are included in the graph, the included links are distributed
in a more random fashion, as reflected by the low clustering
coefficient and low average path length. Similar results have
also been observed in a social science study of friendship between
pupils [13]. As a larger portion of friends is included in the
graph, all three metrics move closer to the values in Section
V-C, where all encounters are included.

We further perform the same experiment using the other
two friendship indexes; for brevity, the corresponding results
are shown in Appendix C.

Top-ranked friends tend to form cliques and low-ranked 
friends are the key to provide random links and reduce the 
degree of separation in encounter graph. 

E. Information diffusion using encounters 

In addition to establishing relationships between nodes,
encounters can also be utilized to diffuse information throughout
the network. In this model, information spreads through nodal
mobility and encounters, where nodes exchange information
when they encounter each other directly. The speed and
reachability of information diffusion among the nodes are
determined by the actual patterns and sequences of encounters.
In this section we seek to answer the question of whether the
current encounter patterns between MNs in wireless networks
are rich enough to be utilized for information diffusion. If the
answer is yes, what delay is incurred in such an information
diffusion scheme, and how robust is it?

As a first step to understand the problem, we use the sim- 
plest diffusion mechanism. We assume sufficient bandwidth 
and reliable communication between MNs, and sufficient 
storage space on all MNs. When a source node has information 
to send, it simply transmits it to all nodes it encounters if 
they have not received the information yet. All intermediate 
nodes cooperate in information diffusion, keeping a copy of 
received information and forwarding it the same way as the 
source node does. This simple approach is known as epidemic
routing in the literature [15]. In an ideal environment with
sufficient resources, it achieves the lowest delay and the highest
delivery rate. Note that the delivered information may not take
the shortest path in terms of transmission hop count, as all
nodes simply take the earliest available chance to propagate
the information. In this work we have chosen epidemic routing
to show the potential of encounter-based information diffusion
under realistic encounter patterns. We further investigate the
robustness of the information diffusion scheme.

Fig. 15. Receive ratio of broadcast messages using epidemic routing

Fig. 16. Information delivery ratio with various selfish node percentage and trace period

In the following simulations, we use a traffic pattern in
which the source node has some information it wishes to
send to all other nodes. The source starts to "diffuse" the
information when it is first online. As time evolves, nodes
encounter each other and an increasing portion of the whole
population receives the information. We study the percentage
of nodes that received the information for various trace
periods and show the results in Fig. 15, using the USC trace as an
example. Each point in the figures of this section is averaged
over runs that use, as sources, the 30% of nodes that appear
earliest in the corresponding trace period.
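The diffusion experiment can be approximated by an earliest-arrival computation over the encounter events, sketched below under the idealized assumptions stated above (unlimited bandwidth and storage, instantaneous transfer). Events are assumed to be `(a, b, start, end)` tuples, and a node informed at time t can pass the message during any encounter still in progress after t. Selfish nodes accept the message but never forward it unless they are the source. The loop is a simple fixed-point iteration, not an efficient simulator, and the names are hypothetical.

```python
import math

def epidemic_diffusion(events, source, start_time, selfish=frozenset()):
    """Earliest time at which each node holds the message under epidemic routing."""
    informed = {source: start_time}
    changed = True
    while changed:  # repeat until no node's arrival time improves
        changed = False
        for a, b, start, end in events:
            for u, v in ((a, b), (b, a)):
                if u not in informed or (u in selfish and u != source):
                    continue  # u has nothing to forward, or refuses to forward
                t = max(start, informed[u])
                if t < end and t < informed.get(v, math.inf):
                    informed[v] = t
                    changed = True
    return informed

def delivery_stats(informed, nodes, source, start_time):
    """(fraction of other nodes reached, mean delay over reached nodes)."""
    delays = [t - start_time for n, t in informed.items() if n != source]
    ratio = len(delays) / (len(nodes) - 1)
    return ratio, (sum(delays) / len(delays) if delays else math.nan)

events = [("S", "A", 0, 10), ("A", "B", 5, 15), ("B", "C", 20, 30), ("D", "E", 0, 100)]
nodes = ["S", "A", "B", "C", "D", "E"]
reached = epidemic_diffusion(events, "S", start_time=2)
print(reached)  # {'S': 2, 'A': 2, 'B': 5, 'C': 20}
print(delivery_stats(reached, nodes, "S", 2))  # (0.6, 7.0)
```

D and E never meet anyone holding the message, so they stay unreached regardless of trace length; passing `selfish={"A"}` cuts the only path to B and C.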

From Fig. 15 we observe that even within a short trace
period (i.e., one day), the information can be diffused to around
89% of the whole population. As the trace period increases,
reachability also improves. Given that most nodes are online
for only part of the trace period (Section IV), visit only a small portion
of the whole campus (Fig. 3), and encounter a small portion
of the whole population (Fig. 8), this result is perhaps beyond
our original expectation. It confirms that
it is potentially possible to deliver information relying only
on encounters in a campus environment with a high success
rate under current user behavior patterns. As the population
of wireless computing devices and their average online time
both increase in the future, we can expect even higher
delivery rates. This subject merits further research.

In some cases, a portion of nodes may not cooperate in
propagating the information, especially in a diverse user
population such as those on university campuses. To understand how
uncooperative users potentially influence the feasibility of
information diffusion, we carry out the following experiment:
we make a portion of users selfish, such that they never forward
information for other sources, and we study the performance
degradation under this setup. For each of the trace periods
used, we progressively make an increasing percentage of nodes
selfish, starting from those with the highest unique encounter
counts. By making nodes with high unique encounter counts selfish
first, we eliminate more transmission opportunities than by picking
selfish nodes randomly; hence we expect to observe a greater
impact on performance. The relationship between the percentage
of selfish nodes and the information delivery ratio is shown
in Fig. 16. The result is very surprising: for all trace periods
tested, the unreachable ratio does not increase significantly
until at least 20% of nodes are selfish. The performance
is even more robust if we take a longer trace period. This
implies that even if a significant portion of users are not willing
to propagate information for others, the underlying nodal
encounter pattern is rich enough for the information to find
an alternative path. Hence the delivery rate is quite robust
for up to an intermediate percentage of selfish nodes. It is
also interesting to observe that for trace periods longer than
15 days, the reachability curves almost overlap. This indicates
that for trace periods longer than 15 days, few new encounters
are introduced between node pairs. Most encounters are
simply repeated ones between the same pairs of nodes; hence
longer trace periods do not further improve robustness
by introducing new potential paths.
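Choosing which nodes to make selfish can be sketched as below: nodes are ranked by unique encounter count, highest first, and the top fraction is returned. Breaking ties by node identifier and rounding the count down are assumptions; the resulting set could then be handed to any diffusion routine that lets selfish nodes receive but not relay messages.

```python
def pick_selfish(events, nodes, fraction):
    """Return the `fraction` of nodes with the most unique encounters."""
    peers = {n: set() for n in nodes}
    for a, b, *_ in events:
        peers[a].add(b)
        peers[b].add(a)
    ranked = sorted(nodes, key=lambda n: (-len(peers[n]), n))
    return set(ranked[: int(len(ranked) * fraction)])

events = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
print(pick_selfish(events, ["A", "B", "C", "D"], 0.25))  # {'A'}
print(sorted(pick_selfish(events, ["A", "B", "C", "D"], 0.5)))  # ['A', 'B']
```

Removing the best-connected relays first is a deliberately pessimistic choice; a random selection would usually leave more high-degree nodes forwarding.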

We further show how the average delay of information diffusion
changes with increasing selfish node percentage in Fig. 17. In
the figure, average delay increases for longer trace durations
because information that is not deliverable in shorter trace
periods becomes deliverable. More interestingly, for all tested
trace durations, average delay does not increase significantly
until more than 40% of nodes are selfish. This implies that
average delay is also robust against selfish user behavior up
to an intermediate percentage.

In the current traces, encounter patterns are rich enough
to support information diffusion. Specifically, information
can be delivered to more than 94% of users within two
days. The reachability and average delay do not degrade
significantly until at least 20%-40% of nodes are selfish.

VI. Discussions and future work 

In this section we discuss the insights we gained from
this work and point out potential future research directions.

• Modeling of users has always been an important research
problem in all types of networks. In Section IV we
propose metrics that fall into four categories to describe
user behavior in WLAN traces. Among these, we emphasize
that user on-off behavior, small coverage, and
repetitive patterns are important features, but they are
largely overlooked by earlier work on mobility modeling
[8], [7] and wireless network simulation. Previous work
on wireless network user modeling [5], while attempting to
preserve user association durations, also did not attempt
to preserve these important features. In the future we
would like to work on a model that uses the dimensions
proposed in Section IV to describe users in wireless
LANs.

• As a comparison between trace-based models and synthetic
mobility models, we argue that the low encounter
percentage shown in Section V-A is not observed in
any of the simulation scenarios used for performance
evaluation in the literature. In typical synthetic mobility
scenarios, all nodes follow the same model to make
movement decisions, albeit with randomness, and eventually
encounter all other nodes [16]. The encounter
pattern from real wireless network traces reflects that a
university campus is a heterogeneous environment rather
than a homogeneous one like those constructed by synthetic
mobility models. To better understand how protocols perform in
such heterogeneous environments, using synthetic models
would not be sufficient.

• Although it is not possible to establish the exact reason 
behind the closeness of some MN pairs, this information 
may be utilized in several applications, such as better 
algorithms for cluster-forming in ad hoc networks, or 
finding a node to temporarily store a packet with higher 
probability to deliver it later to the final recipient. Pro- 
tocols that are aware of social relationship among MNs 
may be an interesting direction in the future. 

• Generally, in social-relationship-aware mechanisms, one
tends to trust top-ranked friends more than the others.
However, as we saw in Section V-D, using only top-ranked
friends results in an ER graph with a high clustering
coefficient and average path length, and may lead to a
disconnected relationship network. In order to remain
connected to a larger community, one should also use
some randomly chosen users (or middle friends) to reduce
the degree of separation in the underlying ER graph.
• The information diffusion model we proposed (in Section V-E)
can be related to several new research directions.
Recently, the delay tolerant network (DTN) paradigm has
been proposed to deliver messages in highly dynamic
networks where network partitions are frequent [14]. In
order to reduce overhead, an important issue in DTN
protocols is to select the next hop intelligently. In the
future, we intend to work on a message-forwarding
mechanism based on our analysis of encounter patterns.
Our robustness and delay analysis of information diffusion
over encounter graphs shows two interesting points:
(1) for message delivery, the delivery ratio and delay
are not affected significantly, even if we cannot choose
the shortest paths due to non-cooperative users; (2) on
the other hand, it would be difficult to prevent harmful
or malicious messages, such as computer worms or
viruses, from propagating through encounters.
Both observations are due to the richness of the underlying
encounter pattern, which provides multiple chances for message
delivery.

VII. Conclusion 

In this paper we study the wireless network traces from four 
different university campuses collected by various methods, 
with focus on different user populations. To the best of our 
knowledge, this is the most comprehensive study to date on 
wireless LAN traces in the literature. 

We first propose metrics to describe individual user behavior, pointing out important common features in all studied traces that have not been emphasized in previous work. Wireless network users on university campuses are characterized by a large percentage of offline time, a limited number of visited APs on the campus, and repetitive association patterns. We believe that these metrics capture important characteristics of users in wireless networks and should be included in user modeling for more accurate performance evaluation in the future. We also find that the detailed distributions differ across the studied traces, owing to differences in the underlying user populations and trace collection methods.

We further study the relationships between MNs in these traces. We find that encounters and friendship are asymmetrically distributed among MNs, indicating that the user population is heterogeneous. Using the encounter-relationship (ER) graph, we establish that it is possible to create a campus-wide community based solely on nodal encounters, and that the relationship graph of such a community can be described as a small-world graph. Finally, we use epidemic routing over trace-based encounter patterns to show that information diffusion is feasible in current wireless networks and that its performance is robust to a moderate proportion of selfish users.

The metrics and experiments in this paper are a further step toward understanding user behavior in wireless networks and provide a good foundation for future research.
Appendix A. BiPareto distribution and Kolmogorov-Smirnov test

In this section we first briefly introduce the BiPareto distribution and the Kolmogorov-Smirnov (K-S) test, and then list the detailed numerical results of fitting BiPareto distribution curves to the total encounter distributions obtained in Section IV-A.

Fig. 18. Illustration of D-statistics and K-S test

The BiPareto distribution was first used in [9] to fit the number of connections per user TCP session and the mean connection inter-arrival time within a TCP session. It was later used in [10] to fit the distribution of association session lengths in wireless LANs. In this work, we use it to fit the distribution of the number of encounters an MN has in WLANs. The complementary cumulative distribution function (CCDF) of the BiPareto distribution is:

$$
P(X > x) =
\begin{cases}
\left(\dfrac{x}{k}\right)^{-\alpha} \left(\dfrac{x + c}{k + c}\right)^{\alpha - \beta}, & x \ge k \\
1, & x < k
\end{cases}
$$

On a log-log scale, the left part of the BiPareto CCDF curve is a straight line with slope $-\alpha$. As $x$ approaches and passes the turning point $c$, the slope gradually changes from $-\alpha$ to $-\beta$: for $x \ll c$ the second factor is nearly constant, while for $x \gg c$ the CCDF behaves like $x^{-\beta}$. In our study of total encounter distributions, we set $k = 1$ for all curves.
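As a quick check of this shape, the Python sketch below evaluates the BiPareto CCDF and measures its log-log slope well below and well above the turning point. The function names `bipareto_ccdf` and `loglog_slope` are our own, and the parameter values are illustrative rather than taken from Table III.

```python
import numpy as np


def bipareto_ccdf(x, alpha, beta, c, k=1.0):
    """P(X > x) of a BiPareto distribution with lower bound k.

    Clamping x to k makes the CCDF equal 1 for x < k, because both
    factors below evaluate to 1 at x = k.
    """
    xc = np.maximum(np.asarray(x, dtype=float), k)
    return (xc / k) ** (-alpha) * ((xc + c) / (k + c)) ** (alpha - beta)


def loglog_slope(x1, x2, **params):
    """Secant slope of the CCDF between x1 and x2 on log-log axes."""
    f1 = bipareto_ccdf(x1, **params)
    f2 = bipareto_ccdf(x2, **params)
    return (np.log(f2) - np.log(f1)) / (np.log(x2) - np.log(x1))


# Illustrative (hypothetical) parameters, not values from Table III.
params = dict(alpha=0.5, beta=2.0, c=1000.0, k=1.0)
print(loglog_slope(2.0, 4.0, **params))  # far below c: slope near -alpha
print(loglog_slope(1e6, 2e6, **params))  # far above c: slope near -beta
```

The two printed slopes are approximately -0.50 and -2.00, matching $-\alpha$ and $-\beta$ for these parameters.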

The Kolmogorov-Smirnov (K-S) test is used to determine whether a hypothesized distribution (in our case, the BiPareto distribution) adequately fits the empirical distribution. Unlike the chi-square test, the K-S test is not sensitive to how the data set is binned; therefore, we choose the K-S test in our study.

Referring to Fig. 18, in the K-S test the distance between the hypothesized distribution and the empirical distribution is measured at every value of $x$, and the maximum of these distances is called the D-statistic. More formally, the D-statistic is defined as:

$$
D_n = \sup_x \left| F_n(x) - F_0(x) \right|
$$

where $F_n(x)$ and $F_0(x)$ are the empirical and hypothesized cumulative distribution functions, respectively. Intuitively, the D-statistic measures the maximum vertical distance between the two distribution curves. A smaller D-statistic indicates a better fit of the hypothesized distribution to the empirical distribution.
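The D-statistic can be computed directly from a sorted sample. The sketch below is a minimal Python version for a continuous hypothesized CDF; `ks_d_statistic` is a hypothetical helper name, and the exponential sample is synthetic, used only for illustration. Because the supremum is attained at a sample point, either at a step of the empirical CDF or just below it, both sides of every step are compared.

```python
import numpy as np


def ks_d_statistic(sample, cdf):
    """One-sample K-S statistic D_n = sup_x |F_n(x) - F_0(x)|.

    F_n is the empirical CDF of `sample`; `cdf` evaluates the
    hypothesized F_0 on an array. For continuous F_0 the supremum is
    reached at a sample point, either after F_n jumps (i/n) or just
    before the jump ((i-1)/n), so both gaps are checked.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f0 = cdf(x)
    ecdf_at = np.arange(1, n + 1) / n    # i/n at the i-th order statistic
    ecdf_before = np.arange(0, n) / n    # (i-1)/n, just below it
    return float(max(np.max(ecdf_at - f0), np.max(f0 - ecdf_before)))


# Synthetic data for illustration: exponential sample vs. its own CDF.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=1000)
print(ks_d_statistic(sample, lambda x: 1.0 - np.exp(-x)))
```

As a cross-check, `scipy.stats.kstest` computes the same two-sided statistic for continuous data.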

We use the minimum squared error method to find the best-fitting BiPareto curves for the empirical total encounter distributions of the various traces. The parameters are listed in Table III. The D-statistics are no larger than 0.05 for all traces except UCSD, whose D-statistic is still reasonably low at 0.07; this indicates that the BiPareto distribution fits the data reasonably well.
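The text does not state in which space the squared error is minimized or how $c$ is searched. One workable approach, sketched below in Python under our own assumptions, minimizes squared error on $\log P(X > x)$. For a fixed $c$, with $u = \log((x+c)/(k+c))$, we have $\log P(X > x) = -\alpha\,[\log(x/k) - u] - \beta\,u$, which is linear in $\alpha$ and $\beta$, so each candidate $c$ from a grid needs only one linear least-squares solve. `fit_bipareto` and the grid values are hypothetical, and this is not necessarily the procedure behind Table III.

```python
import numpy as np


def fit_bipareto(x, ccdf, c_grid, k=1.0):
    """Fit BiPareto (alpha, beta, c) to an empirical CCDF (hypothetical helper).

    Assumption: squared error is measured on log P(X > x). For fixed c,
        log P = -alpha * (log(x/k) - u) - beta * u,  u = log((x+c)/(k+c)),
    which is linear in (alpha, beta); c is chosen from `c_grid`.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(ccdf, dtype=float)
    keep = (p > 0) & (x >= k)   # log undefined at P = 0; model is 1 below k
    x, y = x[keep], np.log(p[keep])
    best = None
    for c in c_grid:
        u = np.log((x + c) / (k + c))
        design = np.column_stack([-(np.log(x / k) - u), -u])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(coef[1]), float(c))
    return best[1:]  # (alpha, beta, c)


# Synthetic check: recover known parameters from an exact BiPareto CCDF.
xs = np.arange(1, 5001, dtype=float)
exact = xs ** -0.03 * ((xs + 4000.0) / 4001.0) ** (0.03 - 5.0)
print(fit_bipareto(xs, exact, c_grid=[1000.0, 2000.0, 4000.0, 8000.0]))
```

Fitting in log space weights the tail of the CCDF more heavily than fitting the raw probabilities would; which choice is appropriate depends on which part of the curve matters most.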

In Section IV-B, we use the exponential distribution to fit the empirical distributions of the friendship index. The CDF of the exponential distribution is:

$$
F(x) = 1 - e^{-\lambda x}, \quad x \ge 0
$$

Table IV lists the $\lambda$ parameters obtained by using the minimum squared error method to fit exponential distributions to the empirical distributions of the friendship index based on encounter time, together with the corresponding D-statistics.
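The exponential fit can be sketched in the same spirit by linearizing the CDF: $y = -\log(1 - F(x)) = \lambda x$ is a line through the origin, whose least-squares slope is $\sum x_i y_i / \sum x_i^2$. This linearization is our assumption, since the text says only that a minimum squared error method was used, and `fit_exponential_rate` is a hypothetical helper.

```python
import numpy as np


def fit_exponential_rate(x, cdf_values):
    """Least-squares rate for F(x) = 1 - exp(-lambda * x).

    Assumption: fit on the linearized form y = -log(1 - F(x)) = lambda * x
    (a line through the origin). Points with F(x) >= 1 are dropped because
    their logarithm is undefined.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(cdf_values, dtype=float)
    keep = f < 1.0
    # log1p(-f) equals log(1 - f) but stays accurate when f is small.
    x, y = x[keep], -np.log1p(-f[keep])
    return float(np.sum(x * y) / np.sum(x * x))


# Synthetic check with a known rate (illustrative values only).
grid = np.linspace(0.0, 0.01, 50)
print(fit_exponential_rate(grid, 1.0 - np.exp(-400.0 * grid)))
```

A fitted exponential with rate $\lambda$ has mean $1/\lambda$; for example, $\lambda = 369.19$ for MIT-rel in Table IV corresponds to a mean of about 0.0027.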



TABLE III

BiPareto distribution fits to total encounter curves and D-statistics for the K-S test

             BiPareto parameters
Trace name   α        β       c       D-statistic
MIT-rel      0.027    9.8     4000    0.036
MIT-cons     0.029    3.0     4500    0.040
UCSD         0.062    16.3    9900    0.068
USC          0.019    0.83    550     0.049
Dart-03      0.0723   0.81    290     0.049
Dart-04      0.0285   4.43    11850   0.025
Dart-rel     0.037    7.46    7200    0.031
Dart-cons    0.037    30.4    30900   0.024



TABLE IV

Exponential distribution fits to friendship index (based on encounter time) curves and D-statistics for the K-S test

Trace name   λ        D-statistic
MIT-rel      369.19   0.0167
USC          305.3    0.0356
Dart-03      500.4    0.0052
Dart-04      411.81   0.0116
Dart-rel     409.91   0.0120
Dart-cons    412.35   0.0119



Appendix B. Additional graphs for encounter-relationship graph metrics

In addition to the figures shown in Section IV-C, we also obtain the same metrics for the MIT-rel and Dart-03 traces. The corresponding figures are shown in Fig. 19 and display trends similar to those discussed in Section IV-C. One interesting observation is that for the MIT trace, the disconnected ratio (DR) is very high until day 3. Further investigation reveals that the MIT trace collection started on a Saturday, and in a purely work-oriented environment, Saturdays and Sundays are the least active days. The disconnected ratio is almost 100% until day 3 because the MNs that are online during the weekend are mostly stationary. On day 3, we observe a jump in the number of nodes in the trace, a sudden decrease in DR, and an abrupt change in both the clustering coefficient (CC) and the average path length (PL).

Appendix C. Encounter-relationship graph metrics based on various friendship indexes

In this section we show the clustering coefficient, average path length, and disconnected ratio of the ER graph generated from the USC trace when different percentages of friends are used, with friendship indexes based on encounter count and on encounter location diversity. The friendship indexes and the metrics are defined in Sections IV-B and IV-D, respectively.

Fig. 19. Change in the ER graph metrics with respect to trace period

Fig. 20 shows how the metrics change when different percentages of friends, ranked by encounter count, are included. The trend is similar to what we observe in Fig. ^], where we use friends based on encounter time: using top friends tends to lead to small cliques and hence high CC, PL, and DR in the generated ER graph. On the other hand, lower-ranked friends represent the random links in the ER graph, leading to low CC, PL, and DR.

However, this trend is not clear when friends are chosen based on encounter location diversity. As shown in Fig. 21, the curves for top-, middle-, and bottom-ranked friends cross each other, and there is no clear trend, unlike the graphs that use the other two friendship indexes. The reason is that, since most MNs visit few APs, the friendship index based on encounter location diversity is less effective at distinguishing the actual degree of friendship. In the extreme case of an MN that visits only one AP, all of its encounters must occur at that AP, so the friendship index based on encounter location diversity is 1.0 for every MN it encounters. Therefore, picking top-, middle-, or bottom-ranked friends degenerates to picking friends at random, which results in a less obvious trend in the ER graph metrics when friends are chosen based on encounter location diversity.
Fig. 20. Metrics of ER graph by taking various percentages of friends based on encounter count
Fig. 21. Metrics of ER graph by taking various percentages of friends based on encounter location diversity



